Psychological & Behavioral distress of COVID-19 & Infodemics

TEAM MEMBERS

  • Madhuri Sajith
  • Usama Ashfaq
  • Vishnu Jayanand
  • Sujith Nyarakkad Sudhakaran
  • Ranjiraj Rajendran Nair

1 Overview and Motivation

The COVID-19 pandemic is an unprecedented health crisis that has affected the whole world. According to the WHO, mental disorders are one of the leading causes of disability worldwide, and the pandemic has further complicated mental health problems. The stress, anxiety, and depression stemming from fear, isolation, and stigma around COVID-19 have affected all of us in one way or another. Many people are losing their jobs, and the elderly are isolated from their usual support networks. The concern is that the effects of the pandemic on mental health may last longer than the disease itself.

The measures taken to slow the spread of the virus have also affected our physical activity levels, our eating behaviors, our sleep patterns, and our relationship with addictive substances, including social media. On this last point, both our increased use of social media while stuck at home and the increased exposure to disaster news over the past year have amplified the negative effects of social media on our mental health. This motivates us to perform a diagnostic analysis of this pattern and to draw meaningful insights from global survey data.

3 Initial Questions

Following from our motivation, we split our task into three main objectives: the impact of distress at a global level, Twitter, and infodemics. Starting from a basic questionnaire analysis, we then look at mainstream platforms to see where people seek information, how they perceive and assimilate it, and how they share it with others, all of which affects their productivity to a large extent.

3.0.1 How many participants were part of this global survey despite this hard time?

The COVIDiSTRESS global survey is an international collaborative undertaking for data gathering on people’s experiences, behavior and attitudes during the COVID-19 pandemic. In particular, the survey focuses on psychological stress, compliance with behavioral guidelines to slow the spread of Coronavirus, and trust in governmental institutions and their preventive measures, but multiple further items and scales are included for descriptive statistics, further analysis and comparative mapping between participating countries. Data is being collected by committed researchers in 43+ countries through online means, mainly based on social snowballing, viral spread and help from interested partners including media. We compute basic descriptive statistics to analyze the trend of the survey and report how participants across different countries contributed to it.
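The count of participants per country described above can be sketched as follows. This is only an illustration on a toy data frame: the column name `Country` and the rows are assumptions, not the actual survey export.

```r
# Toy stand-in for the survey export; the column name `Country` is assumed
survey <- data.frame(
  Country = c("Denmark", "Finland", "Finland", "France", "Denmark", "Finland"),
  stringsAsFactors = FALSE
)

# Number of respondents per country, largest first
participants <- sort(table(survey$Country), decreasing = TRUE)
print(participants)

# Total number of respondents
total_participants <- nrow(survey)
```

The same two calls would apply to the real data once the country column has been identified.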

3.0.2 How was Twitter serving during this distress period?

A 2018 study conducted by MIT researchers and published in Science found that false stories on Twitter diffused more quickly and more widely than true ones. The study analyzed millions of tweets and concluded that novelty and the emotional reactions of Twitter users may be contributing factors. With that in mind, pause a beat before clicking on the share button. Every social media post should be viewed with skepticism until you have done your own fact checking. Do facts and figures include referenced sources? If so, is the data reliable? Is the data clearly explained and are the limitations addressed? Beware of false equivalency, that is, comparing apples with pears. A good example of this is the volume of graphs currently being shared that compare countries with vastly different population densities. We plan to put forth a comparative analysis of Twitter during this pandemic for 2020 and 2021.

3.0.3 Whom should we trust and whom should we not during this pandemic?

We are the channels that spread the infodemic throughout our networks. Even when the validity of a message is in doubt, it is often passed on blindly from one person to another. These messages are designed to mimic inside information or a scoop on what is really happening, and they implicitly receive a fake seal of validity because they come from a close contact. The government has been very upfront in sharing information as quickly and reliably as possible. It is imperative that we do not get sidetracked by word-of-mouth messages from sources only loosely connected to official ones. It takes less than a second to lose our concentration and hit the forward button, giving the infodemic further fuel. Even if you think the people in your network are able to make up their own minds, some of them may be more at risk from these types of malicious messages. Just as with the pandemic, less digitally fluent people find it hard to decipher such messages, which causes undue stress and harm. This motivates us to pose this question and to build a trust graph over all popular communication channels, alongside an assessment of their Infodemic Risk Index (IRI) scores.

4 Data

4.0.1 COVIDISTRESS all global survey data

(The COVIDiSTRESS global survey is an open science collaboration, created by researchers in over 40 countries to rapidly and organically collect data on human experiences of the Coronavirus epidemic 2020.) Dataset can be downloaded here: (Andreas Lieberoth 2020) https://osf.io/z39us/

4.0.2 Twitter Data

We aim to work on the most recent dataset aggregated from Twitter using the twitteR and rtweet libraries, restricted to a particular time window and location.

Here, twitteR provides an interface to the Twitter web API, and rtweet acts as a client for Twitter’s REST and stream APIs; both will be used to retrieve data.

Data cleaning was a substantial part of this work: removing stop words, emojis, and cryptic characters, and converting text to lowercase to maintain semantic integrity.
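The cleaning steps listed above can be sketched in base R as follows. The helper `clean_tweet` and its short stop-word list are hypothetical and only illustrate the idea; they are not the pipeline used in this notebook.

```r
# Hypothetical helper: normalise raw tweet text before analysis
clean_tweet <- function(text,
                        stop_words = c("the", "a", "is", "at", "of", "and")) {
  # Replace emojis and other non-ASCII bytes with spaces
  text <- iconv(text, from = "UTF-8", to = "ASCII", sub = " ")
  text <- tolower(text)                        # convert to lowercase
  text <- gsub("https?://[^ ]+", " ", text)    # drop URLs
  text <- gsub("[^a-z0-9#@ ]", " ", text)      # drop punctuation and cryptic characters
  words <- strsplit(text, " +")[[1]]
  # Remove empty tokens and stop words
  words <- words[nzchar(words) & !(words %in% stop_words)]
  paste(words, collapse = " ")
}

clean_tweet("Stay AT home, the curve is flattening!")
```

Hashtags and mentions are kept here by design (`#` and `@` are allowed characters); whether to keep them depends on the analysis that follows.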

4.0.3 COVID-19 Infodemics Observatory

(The Role of Trust and Information during the COVID-19 Pandemic and Infodemic) Dataset can be downloaded here: [R. Gallotti, N. Castaldo, F. Valle, P. Sacco and M. De Domenico, COVID19 Infodemics Observatory (2020). DOI: 10.17605/OSF.IO/N6UPX] [Van Mulukom, V. (2021, May 15). The Role of Trust and Information during the COVID-19 Pandemic and Infodemic. https://doi.org/10.17605/OSF.IO/GFWBQ] https://osf.io/n6upx/, https://osf.io/67zhg/, https://osf.io/rtacb/, https://osf.io/dh879/, https://osf.io/c37wq/

These datasets comprise a summary of the infodemics data collected across countries, the world risk index, the population emotional state, and news reliability.

( All the above listed datasets can be accessed via our Github repository linked at the footer of this notebook. )

5 Exploratory Data Analysis

5.1 IRI world evolution

First we analyze how the IRI took form across the world during the first quarter of 2020.

# Data files are expected in the current working directory
RESOURCES_DIR_PATH <- getwd()

INFODEMIC_REDUCED_FILE_PATH <- file.path(RESOURCES_DIR_PATH, "infodemics_reduced.csv")
WORLD_RISK_INDEX_FILE_PATH <- file.path(RESOURCES_DIR_PATH, "world_risk_index.csv")
# Semicolon-separated file with a header row
dat.red <- read.table(INFODEMIC_REDUCED_FILE_PATH, header = TRUE, sep = ";")
dat.red$date <- as.Date(dat.red$date)
dat.red

This dataset contains the iso3 country code together with the infodemic volume and scores for countries across continents, which we find vital for extracting specific insights into how the IRI has evolved over the timeline.

dat.iri.world <- read.table(WORLD_RISK_INDEX_FILE_PATH, header = TRUE, sep = ";")
dat.iri.world

This dataset has the world risk index for each day in 2020. We extract these data and use them in our analysis, which is shown as a timeline with a focus on the risk index.

5.1.1 IRI analysis by countries

Here we look at how the IRI has evolved over the timeline and at how periodically its cycle repeats.

library(ggplot2)
library(reshape2)  # melt()

COUNTRY <- "ITA"
# COUNTRY <- "VEN"  # alternative: Venezuela

# Rows for the selected country, sorted by date so that the cumulative
# mean below lines up with the dates it is plotted against
dat.country <- dat.red[which(dat.red$iso3 == COUNTRY), ]
dat.country <- dat.country[order(dat.country$date), ]

dat.red.country <- data.frame(
  date = dat.country$date,
  Unverified = dat.country$IRI_UNVERIFIED,
  Verified = dat.country$IRI_VERIFIED
)

# Cumulative mean of the total IRI (unverified + verified)
tmp.cum.mean <- dplyr::cummean(rowSums(dat.red.country[, 2:3]))

# Long format: one row per date and IRI component
dat.red.country <- melt(dat.red.country, id.vars = "date")

# Daily new cases from the cumulative confirmed count; zeros become NA
dat.red.country.epi <- data.frame(
  date = dat.country$date,
  epi.new = c(0, diff(dat.country$EPI_CONFIRMED))
)
dat.red.country.epi$epi.new[which(dat.red.country.epi$epi.new == 0)] <- NA

dat.red.country.cummean <- data.frame(
  date = dat.country$date,
  Cum.Mean = tmp.cum.mean
)

  ggplot() + 
  theme_bw() + 
  theme(panel.grid = element_blank()) + 
  geom_point(data = dat.red.country.epi, 
             aes(date, size = epi.new), 
             y = 0.9, alpha = 0.5, 
             color = "tomato") + 
  # Stacked bars of the unverified and verified IRI components
  geom_col(data = dat.red.country, 
           aes(date, value, fill = variable), 
           position = position_stack(reverse = TRUE)) + 
  scale_fill_manual(name = "", 
                    values = c('#4DBBD5FF', '#3C5488FF')) + 
  ylab("IRI") + 
  xlab("Timeline")  +
  ylim(c(0, 1)) + 
  guides(size = guide_legend(title = "New Cases")) + 
  geom_text(data = dat.red.country.epi, 
            aes(x = date, label = epi.new), 
            angle = 90, 
            y = 0.97, 
            size = 2, 
            color = "grey30") + 
  geom_line(data = dat.red.country.cummean, 
            aes(date, Cum.Mean), 
            linetype = "dashed", 
            color = "grey30")+
    ggtitle("IRI scores in Italy")

Note that in early 2020 the volume of new cases in Italy is very large: the outbreak takes off in March, with new cases increasing by about 500 per day. A large unverified component of the IRI is also noticeable over the entire timeline.

It is surprising to see no new cases for Venezuela, while its cumulative mean is very high in all months and reaches 0.87 on 17-02-2020.
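The two derived series used in the plot, daily new cases and the cumulative mean of the IRI, can be illustrated on a toy example. The numbers below are made up, and the cumulative mean is written in base R as an equivalent of `dplyr::cummean`.

```r
# Toy cumulative confirmed counts and daily total IRI values (made-up numbers)
epi_confirmed <- c(0, 0, 3, 10, 10, 25)
iri_total     <- c(0.8, 0.6, 0.7, 0.5, 0.4, 0.6)

# Daily new cases: first difference of the cumulative series, padded with 0
epi_new <- c(0, diff(epi_confirmed))
# Days without new cases are set to NA so that they are not drawn
epi_new[epi_new == 0] <- NA

# Cumulative mean: running sum divided by the number of days so far
cum_mean <- cumsum(iri_total) / seq_along(iri_total)

data.frame(epi_confirmed, epi_new, iri_total, cum_mean)
```

Both series assume that the rows are already sorted by date; on unsorted data the differences and running means would be meaningless.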

5.1.2 IRI effect

For the first quarter of 2020, from mid-January to mid-March, we analyze the effects of the IRI across different countries.

library(ggplot2)
library(plotly)  # ggplotly()
library(RColorBrewer)

col <- brewer.pal(9, "YlGnBu") 

idxs.sub <- which(dat.red$TWI_VOLUME > 2000 & dat.red$EPI_CONFIRMED > 100)
country.sub <- as.character(unique(dat.red[idxs.sub, ]$iso3))
dat.red.sub <- dat.red[which(dat.red$iso3 %in% country.sub), 
                       c('date' ,'iso3', 'IRI_ALL')]

ggplotly(ggplot(dat.red.sub, aes(x = date, 
                                 y = reorder(iso3, IRI_ALL), 
                                 fill = IRI_ALL)) + 
  geom_tile() +
  theme_bw() +
  scale_fill_gradientn(colors = col, limits = c(0, 1), name = "IRI") +
  theme(panel.background = element_blank(), 
        panel.grid.major = element_blank(), legend.position = 'top') +
  ylab('Country') +
  xlab('Timeline') +
  geom_hline(yintercept = c(seq(1.5, 21, 1)), color = 'grey70') +
  scale_x_date(expand = c(0, 0))+ggtitle("IRI scores across 20 different countries"))

It can be seen that in countries like Iran the IRI score is quite high compared to a few European countries from the second half of January to the first half of March 2020.

5.1.3 IRI reduction

We compare the cumulative mean IRI across the cumulative number of reported cases, which is grouped into six bins over all countries.

dat.corr2 <- data.frame()
dat.corr <- dat.red[, c("date", "iso3", "EPI_CONFIRMED", "IRI_ALL")]

for(cc in unique(dat.corr$iso3)){
  tmp <- dat.corr[which(dat.corr$iso3 == cc), ]
  tmp <- tmp[order(tmp$date), ]
  tmp$EPI_CONFIRMED_DAILY <- c(0, diff(tmp$EPI_CONFIRMED))
  tmp$IRI_ALL_CUMMEAN <- dplyr::cummean(tmp$IRI_ALL)
  dat.corr2 <- rbind(dat.corr2, tmp)
}

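dat.corr2 <- dat.corr2[!is.na(dat.corr2$IRI_ALL), ]
# Drop rows with zero confirmed cases. A logical index is used because
# x[-which(cond), ] returns no rows at all when cond is never TRUE.
dat.corr2 <- dat.corr2[!(dat.corr2$EPI_CONFIRMED %in% 0), ]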

bin <- rep(0, nrow(dat.corr2))
bin[which(dat.corr2$EPI_CONFIRMED <= 2 )] <- 0
bin[which(3 <= dat.corr2$EPI_CONFIRMED & dat.corr2$EPI_CONFIRMED < 8)] <- 1
bin[which(8 <= dat.corr2$EPI_CONFIRMED & dat.corr2$EPI_CONFIRMED < 16)] <- 2
bin[which(16 <= dat.corr2$EPI_CONFIRMED & dat.corr2$EPI_CONFIRMED < 51)] <- 3
bin[which(51 <= dat.corr2$EPI_CONFIRMED & dat.corr2$EPI_CONFIRMED < 10001)] <- 4
bin[which(10001 <= dat.corr2$EPI_CONFIRMED & dat.corr2$EPI_CONFIRMED < 81000)] <- 5

dat.corr2$bin <- bin

library(dplyr)  # %>%, group_by(), summarise_at()

labels.min <- dat.corr2 %>% group_by(bin) %>% summarise_at(vars(EPI_CONFIRMED), min)
labels.max <- dat.corr2 %>% group_by(bin) %>% summarise_at(vars(EPI_CONFIRMED), max)

# Axis labels "min-max" per bin; the two largest bins get fixed labels
lab <- paste0(labels.min$EPI_CONFIRMED, '-', labels.max$EPI_CONFIRMED)
lab[5:6] <- c('51-9999', '10000+')

ggplotly(ggplot(dat.corr2, aes(as.factor(bin), IRI_ALL_CUMMEAN)) +
  theme_bw() +
  theme(panel.grid = element_blank(),
        legend.position = "none") +
  geom_boxplot(aes(fill = as.numeric(bin)),
               size = 0.15,
               outlier.color = "grey70",
               color = "grey70", notch = TRUE) +
  xlab("Cumulative Number of Reported Cases") +
  ylab("IRI Cumulative Mean") +
  scale_fill_viridis_c() +
  scale_x_discrete(labels = lab) +
  ggtitle("Cumulative IRI vs. Epidemic per index confirmed"))
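As an alternative to the explicit bin assignments above, the same kind of grouping can be sketched with `cut()`. This is an illustration on made-up counts, and it deliberately differs from the code above in one respect: the last bin starts at 10000 and has no upper limit, so that it matches the plot labels.

```r
# Toy cumulative case counts (made-up values)
cases <- c(1, 2, 3, 7, 8, 15, 16, 50, 51, 9999, 10000, 250000)

# Right-open intervals [lower, upper); the last bin has no upper limit
breaks <- c(1, 3, 8, 16, 51, 10000, Inf)
labels <- c("1-2", "3-7", "8-15", "16-50", "51-9999", "10000+")

case_bin <- cut(cases, breaks = breaks, labels = labels, right = FALSE)

# Number of observations per bin
table(case_bin)
```

Because `cut()` returns an ordered set of factor levels, the result can be passed to the x aesthetic directly, without separate label bookkeeping.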

Debriefing

The distributions of the cumulative IRI for the case groups 3-7 and 8-15 are reasonably symmetric, indicating less variability, whereas those for the groups 1-2, 16-50, 51-9999 and 10k+ are left-skewed, which signifies some level of variability.

The medians of all six groups differ, which suggests that the groups differ in their cumulative IRI; the notches of the box plot give a rough visual check of whether two medians differ.

Also note that case group 1-2 appears to have more outliers than four of the other case groups, the exception being the 10k+ group. In addition, for case group 51-9999 the two outlier points above its upper fence overlap and resemble an outlier cluster.
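The comparison of medians through box plot notches can be made explicit with a small sketch. It uses the usual notch half-width of 1.58 * IQR / sqrt(n) around the median; the helper `notch_limits` and the two groups are hypothetical, simulated values, not the survey data.

```r
# Approximate notch limits for one group: median +/- 1.58 * IQR / sqrt(n)
notch_limits <- function(x) {
  x <- x[!is.na(x)]
  half_width <- 1.58 * IQR(x) / sqrt(length(x))
  c(lower = median(x) - half_width, upper = median(x) + half_width)
}

set.seed(1)
# Two made-up groups of cumulative-mean IRI values
group_a <- runif(200, min = 0.2, max = 0.4)
group_b <- runif(200, min = 0.6, max = 0.8)

notch_a <- notch_limits(group_a)
notch_b <- notch_limits(group_b)

# Non-overlapping notches suggest that the two medians differ
notches_overlap <- notch_a[["lower"]] <= notch_b[["upper"]] &&
  notch_b[["lower"]] <= notch_a[["upper"]]
notches_overlap
```

This is a visual heuristic rather than a formal test; a rank-based test across groups would be needed to back a claim of significance.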

5.1.4 IRI score across continents

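This plot shows the IRI against confirmed COVID-19 cases, with infodemics and epidemics data aggregated by country and, at a broader level, categorized by continent.

library(countrycode)  # countrycode()
library(ggrepel)      # geom_text_repel()
library(ggsci)        # scale_color_npg()

x0 <- aggregate(TWI_VOLUME ~ iso3, dat.red, mean)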
colnames(x0) <- c("Country", "Message.Volume") 
x1 <- aggregate( EPI_CONFIRMED ~ iso3, dat.red, max)
colnames(x1) <- c("Country", "Infected")
x2a <- aggregate( IRI_UNVERIFIED ~ iso3, dat.red, mean)
colnames(x2a) <- c("Country", "Risk Unverified")
x2b <- aggregate( IRI_VERIFIED ~ iso3, dat.red, mean)
colnames(x2b) <- c("Country", "Risk Verified")

tab <- merge(x0, x1, by = "Country")
tab <- merge(tab, x2a, by = "Country")
tab <- merge(tab, x2b, by = "Country")

tab$Info.Risk <- tab[, "Risk Verified"] + tab[, "Risk Unverified"]
tab$Continent <- countrycode(tab$Country, 'iso3c', 'continent')

tab <- tab[order(tab$Info.Risk),]
tab <- tab[! is.na(tab$Continent), ]
rownames(tab) <- NULL

Infect.thres <- 0
idxs <- which(tab$Infected > Infect.thres & ! tab$Country %in% c("CHN", "TWN", "IRN"))

ggplot(tab[idxs, ], aes(Info.Risk, Infected, color = Continent, size = Message.Volume))  + 
  theme_bw() + 
  theme(panel.grid = element_blank()) + 
  stat_smooth(method = 'lm', 
              linetype = "solid", 
              color = "red", 
              alpha = 0.2, 
              size = 0.25, 
              se = T) + 
  geom_point(alpha = 0.7) + 
  scale_color_npg() + 
  geom_text_repel(aes(label = Country), 
                  show.legend = F, 
                  seed = 786) + 
  scale_y_log10() + 
  geom_vline(xintercept = median(tab$Info.Risk[idxs], na.rm = T), 
             linetype = "dashed", 
             color = "#dadada") + 
  geom_hline(yintercept = median(tab$Infected[idxs], na.rm = T), 
             linetype = "dashed", 
             color = "#dadada") + 
  xlab("IRI")  + 
  ylab("Confirmed COVID19 Cases") +  
  stat_smooth(linetype = "dashed", 
              color = "black", 
              alpha = 0.2, 
              size = 0.35, 
              se = F ) + 
  labs(size = 'Volume')+
  ggtitle("Showing confirmed COVID-19 cases across countries and the IRI score")

Debriefing

It can be inferred that the volume is very high in the USA, with approximately 383,210 cases. The plot shows this impact across five continents. Moreover, the IRI score is very high in Peru, at around 0.98, although its total number of infected cases is only 11.

5.2 Trust in media sources

We show the trust levels for the most common media sources from which people sought information.

library(foreign)  # read.spss()

data_12 <- read.spss("DATA_COVID19_TrustInformation.sav",
                     use.value.labels = FALSE,
                     to.data.frame = TRUE)
names(data_12) <- tolower(names(data_12))
data_12

This dataset contains the trust levels collected through a survey form, all recorded as categorical values. We use it for regression analysis and to draw conclusions about trust in media and local authorities.

5.2.1 Trust analysis in local governments vs. global governments

ggplotly(ggplot(data_scatterplot_gov, aes(x=govglob, y=govloc)) +
  geom_point(col="blue", alpha = 0.9) + 
  geom_text(label=data_scatterplot_gov$country)+
  xlab("Trust in global governments")+
  ylab("Trust in country's government")+
  ggtitle("Plot for trust among citizens in Country's government for 12 different countries")+
  geom_smooth(method = "lm"))

Debriefing - Trust in Governments

New Zealand was declared a COVID-19 free country in 2020, which may explain why trust in its government is remarkably higher than in the other countries. In contrast, it shows a low trust score for global governments.

At the same time, the trust score for Brazil’s local government is extremely low, but Brazil shows high trust in global governments.

This indicates a strong negative correlation between trust in a country’s own government and trust in global governments.

5.2.2 Trust analysis in local scientists vs. global scientists

ggplotly(ggplot(data_scatterplot_scient, aes(x=scientglob, y=scientloc)) +
  geom_point(col="blue", alpha = 0.5) + 
  geom_text(label=data_scatterplot_scient$country)+
  xlab("Trust in scientists globally")+
  ylab("Trust in country's scientists")+ 
  ggtitle("Plot for trust among citizens in Country's scientists for 12 different countries")+
  geom_smooth(method = "lm"))

Debriefing - Trust in Scientists

For the same set of 12 countries, we do a comparative analysis by plotting the trust in the scientists of their own country against the trust in scientists globally. From our data, for Italy the trust in local scientists is as low as 5.78, whereas globally it is just 5.31. For Brazil the local trust score is nearly 8.4, whereas the global trust score is around 8.6.

This shows a strong positive correlation between trust in a country’s own scientists and trust in global scientists.
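The strength and direction of such a relationship can be quantified with Pearson’s correlation. The sketch below uses made-up country-level scores (the data frame `trust` is hypothetical and does not contain the survey values); only the column names mirror the plot above.

```r
# Made-up country-level mean trust scores (not the survey values)
trust <- data.frame(
  country    = c("A", "B", "C", "D", "E", "F"),
  scientloc  = c(5.8, 6.4, 7.1, 7.6, 8.0, 8.4),
  scientglob = c(5.3, 6.6, 6.9, 7.9, 8.1, 8.6)
)

# Pearson's product-moment correlation with a test of H0: rho = 0
ct <- cor.test(trust$scientglob, trust$scientloc, method = "pearson")
ct$estimate   # correlation coefficient r
ct$p.value    # p-value of the test

# Slope of the least-squares line, as drawn by geom_smooth(method = "lm")
slope <- coef(lm(scientloc ~ scientglob, data = trust))[["scientglob"]]
slope
```

With only 12 countries in the real data, the confidence interval in `ct$conf.int` is worth reporting alongside the coefficient.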

5.2.3 Regression analysis (Trust scores for local authorities)

We create 4 different models, separating countries into a popular (populist) country list and a non-popular (non-populist) country list, for conspiracy beliefs and perceived knowledge in relation to mainstream online media and well-known newspaper websites. We performed the regression analysis using the lavaan library for the structural equation models and the lme4 library for the mixed-effects models, since the data are grouped by country and this grouping introduces an additional source of random variability.

library(lavaan)

myModel_gov <- '
feelinginformedZ ~ a1*trustgov_localZ
conspiracyZ ~ a2*trustgov_localZ
safetyprecautions_avgZ ~ b1*feelinginformedZ + b3*conspiracyZ + c1*trustgov_localZ
otheravoid_avgZ ~ b2*feelinginformedZ + b4*conspiracyZ + c2*trustgov_localZ
## indirects
indirect1 := a1 * b1
indirect2 := a2 * b3
indirect3 := a1 * b2
indirect4 := a2 * b4
## contrasts
con1 := (a1*b1) - (a2*b3)
con2 := (a1*b2) - (a2*b4)
## covariates
feelinginformedZ ~~ conspiracyZ
safetyprecautions_avgZ ~~ otheravoid_avgZ
'

myModel_sci <- '
feelinginformedZ ~ a1*trustscientistsZ
conspiracyZ ~ a2*trustscientistsZ
safetyprecautions_avgZ ~ b1*feelinginformedZ + b3*conspiracyZ + c1*trustscientistsZ
otheravoid_avgZ ~ b2*feelinginformedZ + b4*conspiracyZ + c2*trustscientistsZ
## indirects
indirect1 := a1 * b1
indirect2 := a2 * b3
indirect3 := a1 * b2
indirect4 := a2 * b4
## contrasts
con1 := (a1*b1) - (a2*b3)
con2 := (a1*b2) - (a2*b4)
## covariates
feelinginformedZ ~~ conspiracyZ
safetyprecautions_avgZ ~~ otheravoid_avgZ
'
# Fit the mediation model with bootstrap standard errors (5000 draws)
fit <- sem(myModel_sci, data=meddata_sci, se = "bootstrap", bootstrap = 5000)
summary(fit, fit.measures=TRUE, standardized=TRUE, rsquare=TRUE)
## lavaan 0.6-8 ended normally after 16 iterations
## 
##   Estimator                                         ML
##   Optimization method                           NLMINB
##   Number of model parameters                        14
##                                                       
##                                                   Used       Total
##   Number of observations                          7430        7755
##                                                                   
## Model Test User Model:
##                                                       
##   Test statistic                                 0.000
##   Degrees of freedom                                 0
## 
## Model Test Baseline Model:
## 
##   Test statistic                              2989.336
##   Degrees of freedom                                10
##   P-value                                        0.000
## 
## User Model versus Baseline Model:
## 
##   Comparative Fit Index (CFI)                    1.000
##   Tucker-Lewis Index (TLI)                       1.000
## 
## Loglikelihood and Information Criteria:
## 
##   Loglikelihood user model (H0)             -40495.729
##   Loglikelihood unrestricted model (H1)     -40495.729
##                                                       
##   Akaike (AIC)                               81019.458
##   Bayesian (BIC)                             81116.244
##   Sample-size adjusted Bayesian (BIC)        81071.755
## 
## Root Mean Square Error of Approximation:
## 
##   RMSEA                                          0.000
##   90 Percent confidence interval - lower         0.000
##   90 Percent confidence interval - upper         0.000
##   P-value RMSEA <= 0.05                             NA
## 
## Standardized Root Mean Square Residual:
## 
##   SRMR                                           0.000
## 
## Parameter Estimates:
## 
##   Standard errors                            Bootstrap
##   Number of requested bootstrap draws             5000
##   Number of successful bootstrap draws            5000
## 
## Regressions:
##                            Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##   feelinginformedZ ~
##     trstscntZ (a1)            0.246    0.013   18.613    0.000    0.246    0.247
##   conspiracyZ ~
##     trstscntZ (a2)           -0.365    0.012  -29.724    0.000   -0.365   -0.369
##   safetyprecautions_avgZ ~
##     flngnfrmZ (b1)            0.099    0.013    7.864    0.000    0.099    0.099
##     consprcyZ (b3)            0.081    0.013    6.160    0.000    0.081    0.080
##     trstscntZ (c1)            0.115    0.015    7.593    0.000    0.115    0.115
##   otheravoid_avgZ ~
##     flngnfrmZ (b2)           -0.000    0.012   -0.032    0.975   -0.000   -0.000
##     consprcyZ (b4)            0.212    0.014   15.498    0.000    0.212    0.210
##     trstscntZ (c2)            0.036    0.014    2.656    0.008    0.036    0.036
## 
## Covariances:
##                             Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##  .feelinginformedZ ~~
##    .conspiracyZ               -0.096    0.012   -8.031    0.000   -0.096   -0.108
##  .safetyprecautions_avgZ ~~
##    .otheravoid_vgZ             0.319    0.011   27.951    0.000    0.319    0.331
## 
## Variances:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##    .feelinginfrmdZ    0.926    0.016   58.278    0.000    0.926    0.939
##    .conspiracyZ       0.842    0.018   46.650    0.000    0.842    0.864
##    .sftyprctns_vgZ    0.971    0.019   51.802    0.000    0.971    0.975
##    .otheravoid_vgZ    0.956    0.014   66.616    0.000    0.956    0.960
## 
## R-Square:
##                    Estimate
##     feelinginfrmdZ    0.061
##     conspiracyZ       0.136
##     sftyprctns_vgZ    0.025
##     otheravoid_vgZ    0.040
## 
## Defined Parameters:
##                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
##     indirect1         0.024    0.003    7.225    0.000    0.024    0.024
##     indirect2        -0.030    0.005   -6.000    0.000   -0.030   -0.030
##     indirect3        -0.000    0.003   -0.032    0.975   -0.000   -0.000
##     indirect4        -0.077    0.006  -13.643    0.000   -0.077   -0.077
##     con1              0.054    0.006    8.656    0.000    0.054    0.054
##     con2              0.077    0.007   11.300    0.000    0.077    0.077
parameterEstimates(fit, boot.ci.type="bca.simple")
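The information criteria in the summary above can be reproduced from the reported log-likelihood, which is a useful sanity check when reading the output. The sketch below uses the standard definitions AIC = 2k - 2 logLik and BIC = k log(n) - 2 logLik, with the three inputs copied from the printed output.

```r
# Values copied from the lavaan summary above
loglik   <- -40495.729   # log-likelihood of the user model (H0)
n_params <- 14           # number of model parameters
n_obs    <- 7430         # number of observations used

# AIC = 2k - 2 logLik ; BIC = k log(n) - 2 logLik
aic <- 2 * n_params - 2 * loglik
bic <- n_params * log(n_obs) - 2 * loglik

round(aic, 3)
round(bic, 3)
```

Both values are only meaningful relative to other models fitted to the same data; the model here is saturated (0 degrees of freedom), which is why CFI, TLI and RMSEA take their boundary values.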

Debriefing

We use Structural Equation Models (SEM) to fit our models. We draw 5000 bootstrap samples and use the adjusted bootstrap percentile (BCa) interval in lavaan’s bca.simple variant, which corrects the interval for bias in the distribution of the bootstrap estimates but not for acceleration. Following that, we calculated the correlation of trust in media using Pearson’s product-moment correlation. We report metrics such as the Intraclass Correlation Coefficient (ICC), AIC and BIC to assess model fit and complexity. We do this taking the country of residence, countryres, as the grouping factor.
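The idea behind bootstrapping an indirect effect (a path a from predictor to mediator, multiplied by a path b from mediator to outcome) can be sketched in base R. This is a simplified illustration on simulated data: it uses two ordinary regressions and a plain percentile interval without bias correction, not the lavaan model or the interval type used above, and the helper `indirect_effect` is hypothetical.

```r
set.seed(42)
n <- 500
# Simulated standardised variables (made-up data, not the survey)
x <- rnorm(n)                        # predictor, e.g. a trust score
m <- 0.5 * x + rnorm(n)              # mediator
y <- 0.4 * m + 0.1 * x + rnorm(n)    # outcome
dat <- data.frame(x, m, y)

# Indirect effect a * b from two ordinary regressions
indirect_effect <- function(d) {
  a <- coef(lm(m ~ x, data = d))[["x"]]
  b <- coef(lm(y ~ m + x, data = d))[["m"]]
  a * b
}

estimate <- indirect_effect(dat)

# Nonparametric bootstrap: resample rows with replacement and re-estimate
boot_draws <- replicate(1000, {
  idx <- sample.int(n, replace = TRUE)
  indirect_effect(dat[idx, ])
})

# Plain percentile interval (no bias correction)
ci <- quantile(boot_draws, probs = c(0.025, 0.975))
ci
```

An interval that excludes zero is read as evidence for an indirect effect; the bootstrap is preferred here because the product a * b is generally not normally distributed.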

Based on the grouping factor countryres, we categorize countries into a popular country list (populist) and a non-popular country list (non-populist). For each list we analyze the conspiracy belief and the perceived knowledge associated with each of the (online) information sources. We then perform a multiple regression test and plot the regression summaries. The information sources that we consider are the WHO, Facebook, Twitter, Instagram, national health institutes, the national government and newspaper websites.

5.2.4 Regression Analysis (trust scores in media)

library(lme4)
library(lmerTest)
library(performance)  # r2(), icc()

table_socmed_knowl_pop = lmer(feelinginformedZ ~ trust_fbZ + trust_igZ + trust_govZ  + trust_twZ + trust_nhsZ + trust_newspaperZ + trust_whoZ+ (1 | countryres), 
                              data = data_pop)
summary(table_socmed_knowl_pop)
## Linear mixed model fit by REML. t-tests use Satterthwaite's method [
## lmerModLmerTest]
## Formula: feelinginformedZ ~ trust_fbZ + trust_igZ + trust_govZ + trust_twZ +  
##     trust_nhsZ + trust_newspaperZ + trust_whoZ + (1 | countryres)
##    Data: data_pop
## 
## REML criterion at convergence: 11311.5
## 
## Scaled residuals: 
##     Min      1Q  Median      3Q     Max 
## -3.8872 -0.6168  0.0428  0.7562  2.1192 
## 
## Random effects:
##  Groups     Name        Variance Std.Dev.
##  countryres (Intercept) 0.06116  0.2473  
##  Residual               0.94733  0.9733  
## Number of obs: 4045, groups:  countryres, 3
## 
## Fixed effects:
##                    Estimate Std. Error         df t value Pr(>|t|)    
## (Intercept)         0.10372    0.14385    1.98052   0.721  0.54641    
## trust_fbZ          -0.03983    0.01843 4035.77480  -2.161  0.03072 *  
## trust_igZ          -0.02643    0.01861 4036.43024  -1.420  0.15561    
## trust_govZ          0.04286    0.01838 4033.19553   2.332  0.01973 *  
## trust_twZ           0.06639    0.01806 4035.20279   3.676  0.00024 ***
## trust_nhsZ          0.05689    0.02149 4036.22987   2.648  0.00814 ** 
## trust_newspaperZ    0.13605    0.01689 4027.61429   8.056 1.03e-15 ***
## trust_whoZ         -0.01639    0.02023 4035.64716  -0.810  0.41787    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Correlation of Fixed Effects:
##             (Intr) trst_fZ trust_gZ trst_gvZ trst_tZ trst_nhZ trst_nwZ
## trust_fbZ    0.017                                                    
## trust_igZ   -0.013 -0.367                                             
## trust_govZ   0.033 -0.013   0.002                                     
## trust_twZ   -0.002 -0.097  -0.363    0.009                            
## trust_nhsZ  -0.005 -0.011  -0.033   -0.403    0.055                   
## trst_nwsppZ -0.015 -0.086   0.005   -0.163   -0.191   0.016           
## trust_whoZ  -0.013 -0.057  -0.001    0.038    0.018  -0.568   -0.206
confint(table_socmed_knowl_pop)
##                         2.5 %      97.5 %
## .sig01            0.102160170  0.59947043
## .sigma            0.951647439  0.99405753
## (Intercept)      -0.223085552  0.43175753
## trust_fbZ        -0.076048632 -0.00385625
## trust_igZ        -0.062555431  0.01042703
## trust_govZ        0.006453834  0.07850765
## trust_twZ         0.030979457  0.10173591
## trust_nhsZ        0.014585726  0.09878340
## trust_newspaperZ  0.103317010  0.16954664
## trust_whoZ       -0.055639001  0.02371212
r2(table_socmed_knowl_pop)
## # R2 for Mixed Models
## 
##   Conditional R2: 0.092
##      Marginal R2: 0.034
icc(table_socmed_knowl_pop)
## # Intraclass Correlation Coefficient
## 
##      Adjusted ICC: 0.061
##   Conditional ICC: 0.059
AIC(table_socmed_knowl_pop)
## [1] 11331.55
BIC(table_socmed_knowl_pop)
## [1] 11394.6

Debriefing

We observe that a 1-unit increase in trust in newspaper websites (trust_newspaperZ) is associated with a 0.14 increase (estimate 0.136) in the feeling of being informed (feelinginformedZ), with all other factors held constant, for the popular (populist) countries. This is the largest estimate among the media sources in the model.

The AIC (11331.55) and BIC (11394.6) are reported so that this model can be compared against the alternative models; on their own they do not tell us how complex the model is. From the Adjusted ICC of our mixed-effects model, around 6.1% of the variance is attributable to the grouping factor countryres. The Conditional \(R^2\) shows that the fixed and random effects together explain around 9.2% of the variance, while the Marginal \(R^2\) attributes 3.4% to the fixed effects alone.
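The ICC and the information criteria quoted above can be recomputed from the printed model summary. The sketch below assumes the usual definition of the adjusted ICC (random-effect variance over random-effect plus residual variance) and counts 10 estimated parameters (8 fixed effects plus 2 variance parameters); all inputs are copied from the output, so small differences are due to rounding.

```r
# Variance components copied from the random-effects table above
var_country  <- 0.06116   # countryres (Intercept)
var_residual <- 0.94733   # Residual

# Adjusted ICC: share of the between-country variance in the total
icc_adjusted <- var_country / (var_country + var_residual)

# AIC and BIC of the REML fit: criterion plus a penalty on the parameter count
reml_criterion <- 11311.5   # as printed (rounded to one decimal)
n_params <- 8 + 2           # 8 fixed effects + 2 variance parameters
n_obs    <- 4045            # number of observations

aic <- reml_criterion + 2 * n_params
bic <- reml_criterion + log(n_obs) * n_params

round(c(ICC = icc_adjusted, AIC = aic, BIC = bic), 3)
```

Note that the model has only 3 country groups, so the between-country variance, and with it the ICC, is estimated from very little information.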

5.2.5 Trust graph

We draw a so-called “trust graph” from the above regression analysis to compare how people interact with modern online social media, and we show the trust levels based on conspiracy beliefs and perceived knowledge.

multiregressiontest <- plot_summs(table_socmed_conspir_non, 
                                  table_socmed_conspir_pop, 
                                  table_socmed_knowl_non, 
                                  table_socmed_knowl_pop, 
                                  omit.coefs = c("(Intercept)", 
                                                 "Intercept"),
                                  model.names = c("Conspiracy belief (non-populist)",
                                                  "Conspiracy belief (populist)",
                                                  "Perceived Knowledge(non-populist)",
                                                  "Perceived Knowledge (populist)"
                                                  ))


multi_plot_con <- multiregressiontest + theme_apa()

multi_plot_con +
  theme(legend.position = "top") +
  scale_y_discrete(labels = c("World Health Organisation",
                              "Newspaper websites",
                              "National health institutions",
                              "Twitter",
                              "National government",
                              "Instagram",
                              "Facebook")) +
  xlab("Estimate") +
  ylab("Information sources") +
  ggtitle("Trust in information sources during the COVID-19 among citizens")

Debriefing

Interestingly, in the popular country list, perceived knowledge about COVID-19 is most strongly associated with trust in newspaper websites, with an estimate in the range [0.1, 0.16]. In the non-popular country list the estimate is higher, in the range [0.17, 0.27], and the strongest association is with trust in the national government.

People believe in conspiracy theories for a variety of reasons: to explain random events, to feel special or unique, or for a sense of social belonging, to name a few. In our analysis, conspiracy belief in non-popular countries is most strongly associated with Facebook, with an estimate in the range [0.14, 0.22]. For popular countries the range for the same source is lower, [0.08, 0.15].

Trust in the W.H.O. has a negative estimate for conspiracy belief in both popular and non-popular countries.

6 Final Analysis

6.1 Global survey

The countries included in the analyses, and the respective sample size, are: Austria (279), Belgium (557), Bulgaria (4,538), Croatia (2,909), Cyprus (34), Czech Republic (1,344), Denmark (10,327), Estonia (34), Finland (20,810), France (12,446), Germany (1,271), Greece (628), Hungary (1,427), Ireland (209), Italy (1,370), Latvia (22), Lithuania (8,056), Luxembourg (59), Malta (21), Netherlands (1,256), Poland (3,052), Portugal (827), Romania (189), Slovakia (597), Slovenia (21), Spain (554), Sweden (2,733).

Respondents were 74.18% female and 24.63% male. The remaining respondents answered ‘other’ or did not provide an answer.

The majority of respondents (67.16%) were in full-time, part-time work or self-employed, 16.06% were either unemployed or retired, 16.79% were students.

The age of the respondents ranged from 18 to 110, with a median age of 38.

Individuals’ general stress levels were measured using an established ten-item scale developed by psychologists (Cohen 1983). This scale measures participants’ stress during the last week by using indicators of stress responses, for instance, perceived lack of control over events, pressure from mounting difficulties and feeling upset about unexpected changes. Scores are considered moderate above 2.4, and high above 3.7. Levels of stress were moderate or lower in many countries. Poland and Portugal reported the highest levels of stress in Europe, and Denmark and the Netherlands the lowest.
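The cut-offs above can be applied to a vector of scale scores. The following is a minimal sketch, assuming scores on the same scale as the thresholds; the function name classify_stress, the label "low" for scores at or below 2.4, and the example scores are our own illustrative choices rather than part of the published scale.

```r
# Sketch: bin perceived-stress scores with the cut-offs quoted in the text
# (moderate above 2.4, high above 3.7). The "low" label is an assumption.
classify_stress <- function(score) {
  cut(score,
      breaks = c(-Inf, 2.4, 3.7, Inf),
      labels = c("low", "moderate", "high"),
      right  = TRUE)  # intervals are (-Inf, 2.4], (2.4, 3.7], (3.7, Inf)
}

# Hypothetical scores, not survey data
example_scores <- c(2.4, 2.5, 3.7, 3.8)
as.character(classify_stress(example_scores))
```

With right = TRUE, a score exactly at a threshold stays in the lower category, which matches the wording "above 2.4" and "above 3.7".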

Levels of stress remained fairly stable over the middle of April, with a negligible decrease between April 4th and April 13th. Overall levels of stress remained higher in women compared to men throughout the period under consideration.

Participants were asked to indicate the extent to which a range of different factors represented a source of distress during the COVID-19 health crisis. Specifically, participants indicated their disagreement or agreement with how much each factor from a list represented a source of distress (1 = Strongly Disagree, 6 = Strongly Agree). Results indicated that people were on average concerned with the state of the national economy. Economic considerations were followed closely by health-related risks, such as the risks of being hospitalized and of contracting the new disease.

Participants were asked how much they trusted six key institutions in relation to the COVID-19 emergency (on a scale from 1 = not at all to 10 = completely). Specifically, participants were asked about their trust towards the health care systems, the World Health Organization (W.H.O.), the national governments’ efforts to tackle COVID-19, the Police, the civil service and the national government.

Overall, participants reported only medium levels of trust, with the highest levels of trust for their countries’ healthcare system and the W.H.O. Trust towards the national government was relatively lower compared to the other institutions examined.

6.2 Twitter

6.3 Infodemics - We need to flatten this curve

Just as we need to flatten the COVID-19 curve, we must also tackle the infodemic curve. As with COVID-19, we must attack the curve on two fronts: suppressing the contagion and increasing our capacity to deal with the surge of information that is coming our way.

In our regression analysis we compared how the following factors, related to trust in the media, correlate with one another:

  • Is COVID-19 a naturally occurring virus or an artificially made one (e.g. lab-created)? (virus_natart)

In our analysis the estimated correlation between virus_natart and media_underover is about 0.13 (13%). This is a weak positive association, so it suggests only a modest link between beliefs about the origin of the virus and the perception that the media is over-hyping the situation.

  • How is the media in general reporting on the COVID-19 situation: underplaying, over-hyping or just right? (media_underover)

We compare media_underover with two variables. With virus_natart the correlation is positive, which points to perceived over-hyping on this question. With feelinginformed_avg the correlation is negative, so respondents who perceive over-hyping are likely to feel less informed.

  • How often is contradictory news found that turns out to be fake news? (fakenews)

For virus_natart and fakenews the correlation is only about 0.036 (3.6%), so the data show almost no relationship between the two and do not support a firm conclusion on this question.

  • How informed (feelinginformed_avg) do citizens feel about:

    • The risk of contracting COVID-19
    • Symptoms of COVID-19
    • How COVID-19 virus spreads?
    • How to prevent COVID-19 from spreading?
    • Treatment of COVID-19

We analyzed feelinginformed_avg with respect to both fakenews and media_underover. For the former we observe a very low correlation, suggesting that at most a slight share of respondents is convinced by fake news on social media. For the latter we observe a negative correlation, suggesting that respondents who see the media as over-hyping are likely to feel less informed.
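The pairwise correlations discussed above can be estimated with cor.test() from base R. The sketch below uses simulated data only: the variable names mirror the survey items, but the values, the sample size and the slope of 0.13 are assumptions chosen to mimic a weak association, not the survey responses or our results.

```r
# Sketch: Pearson correlation test on SIMULATED data (not the survey data)
set.seed(42)  # make the simulation reproducible
n <- 500      # hypothetical sample size

virus_natart    <- rnorm(n)
media_underover <- 0.13 * virus_natart + rnorm(n)  # weak built-in association

ct <- cor.test(virus_natart, media_underover, method = "pearson")

ct$estimate  # sample correlation coefficient
ct$conf.int  # 95% confidence interval for the true correlation
ct$p.value   # test of the null hypothesis that the true correlation is zero
```

Reporting the confidence interval alongside the estimate helps to separate a weak but reliable association from one that is indistinguishable from zero.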

We therefore ask our readers to follow these practices in the near future:

  • Fact-check to stay aware of what is currently circulating,
  • Stick to trusted sources,
  • Do not forward without checking the authenticity of messages,
  • Increase supply of data by engaging regularly and meaningfully on the platforms that people are already using.

Glossary

Infodemic

Infodemic is a portmanteau of “information” and “epidemic” that typically refers to a rapid and far-reaching spread of both accurate and inaccurate information about something, such as a disease. As facts, rumors, and fears mix and disperse, it becomes difficult to learn essential information about an issue.

IRI

Infodemic Risk Index: the likelihood that a user receives messages pointing to potentially misleading sources. This index quantifies if and how users are exposed to circulating information.

References

Andreas Lieberoth, Sabrina Stoeckli, Jesper Rasmussen. 2020. “COVIDiSTRESS Global Survey Network (2020, March 30).” https://doi.org/10.17605/OSF.IO/Z39US.
Cohen, Sheldon, Tom Kamarck, and Robin Mermelstein. 1983. “A Global Measure of Perceived Stress.” Journal of Health and Social Behavior 24 (4): 385–396. https://doi.org/10.2307/2136404.
Matošević, Goran, and Vanja Bevanda. 2020. “Sentiment Analysis of Tweets about COVID-19 Disease During Pandemic,” 1290–95. https://doi.org/10.23919/MIPRO48935.2020.9245176.
Mulukom, Valerie van. 2021. “The Role of Trust and Information During the COVID-19 Pandemic and Infodemic,” May. https://doi.org/10.17605/OSF.IO/GFWBQ.
R. Gallotti, F. Valle, N. Castaldo, and M. De Domenico. 2021. “Covid19 Infodemics Observatory (2020),” May. https://doi.org/10.17605/OSF.IO/N6UPX.
Travaglino, Lieberoth, G. A. 2020. “How Is Covid19 Affecting Europeans’ Lives? Report of the COVIDiSTRESS Global Survey.” https://doi.org/10.13140/RG.2.2.30558.59209.
Vijay, Tanmay, Ayan Chawla, Balan Dhanka, and Purnendu Karmakar. 2020. “Sentiment Analysis on COVID-19 Twitter Data,” 1–7. https://doi.org/10.1109/ICRAIE51050.2020.9358301.